Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems
Zhang, Yuxin; Wang, Yan; Chen, Yongrui; Zhang, Shenyu; Dai, Xinbang; Bi, Sheng; Qi, Guilin
Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external retrieved information, mitigating issues such as hallucination and outdated knowledge. However, RAG systems are highly sensitive to retrieval noise prevalent in real-world scenarios. Existing benchmarks fail to emulate the complex and heterogeneous noise distributions encountered in real-world retrieval environments, undermining reliable robustness assessment. In this paper, we define four categories of retrieval noise based on linguistic properties and noise characteristics, aiming to reflect the heterogeneity of noise in real-world scenarios. Building on this, we introduce Magic Mushroom, a benchmark for replicating "magic mushroom" noise: contexts that appear relevant on the surface but covertly mislead RAG systems. Magic Mushroom comprises 7,468 single-hop and 3,925 multi-hop question-answer pairs. More importantly, Magic Mushroom enables researchers to flexibly configure combinations of retrieval noise according to specific research objectives or application scenarios, allowing for highly controlled evaluation setups. We evaluate LLM generators of varying parameter scales and classic RAG denoising strategies under diverse noise distributions to investigate their performance dynamics during progressive noise encroachment. Our analysis reveals that both generators and denoising strategies have significant room for improvement and exhibit extreme sensitivity to noise distributions. Magic Mushroom emerges as a promising tool for evaluating and advancing noise-robust RAG systems, accelerating their widespread deployment in real-world applications. The Magic Mushroom benchmark is available at https://drive.google.com/file/d/1aP5kyPuk4L-L_uoI6T9UhxuTyt8oMqjT/view?usp=sharing.
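The abstract's key feature is that researchers can "flexibly configure combinations of retrieval noise." A minimal sketch of that idea, using stdlib Python only: the category names and the `build_context` helper are illustrative assumptions, not the benchmark's actual API.

```python
import random
from collections import Counter

# Hypothetical noise-category labels; the paper defines its own four
# categories, so these names are placeholders for illustration.
NOISE_TYPES = ["golden", "distracting", "irrelevant", "counterfactual"]

def build_context(passages_by_type, mixture, k=10, seed=0):
    """Assemble a k-passage retrieval context following a noise mixture.

    mixture maps noise type -> fraction of the context (fractions sum to 1),
    letting an experiment sweep from clean retrieval to heavy noise.
    """
    rng = random.Random(seed)
    context = []
    for ntype, frac in mixture.items():
        n = round(frac * k)
        context.extend(rng.sample(passages_by_type[ntype], n))
    rng.shuffle(context)  # noise position should not be predictable
    return context

# Toy passage pools, keyed by noise type.
passages = {t: [f"{t}_{i}" for i in range(20)] for t in NOISE_TYPES}
mixture = {"golden": 0.2, "distracting": 0.4, "irrelevant": 0.4}
ctx = build_context(passages, mixture, k=10)
counts = Counter(p.rsplit("_", 1)[0] for p in ctx)
```

Varying `mixture` while holding the question fixed is what enables the "progressive noise encroachment" evaluation the abstract describes.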
Longitudinal Abuse and Sentiment Analysis of Hollywood Movie Dialogues using LLMs
Chandra, Rohitash; Ren, Guoxiang; Group-H
Over the past decades, there has been an increasing concern about the prevalence of abusive and violent content in Hollywood movies. This study uses Large Language Models (LLMs) to explore the longitudinal abuse and sentiment analysis of Hollywood Oscar and blockbuster movie dialogues from 1950 to 2024. By employing fine-tuned LLMs, we analyze subtitles for over a thousand movies categorised into four genres to examine the trends and shifts in emotional and abusive content over the past seven decades. Our findings reveal significant temporal changes in movie dialogues, which reflect broader social and cultural influences. Overall, the emotional tendencies in the films are diverse, and the detection of abusive content also exhibits significant fluctuations. The results show a gradual rise in abusive content in recent decades, reflecting changes in social norms and regulatory policy. Genres such as thrillers still present a higher frequency of abusive content, emphasising the ongoing narrative role of violence and conflict. At the same time, underlying positive emotions such as humour and optimism remain prevalent in most of the movies. Furthermore, the increase in abusive content in movie dialogues has been particularly pronounced over the last two decades, during which Oscar-nominated movies overtook the top ten blockbusters.
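The longitudinal step described above (per-line labels aggregated into per-decade trends) can be sketched with stdlib Python. The record format and function name are assumptions for illustration; the actual labels would come from the study's fine-tuned LLM classifiers.

```python
from collections import defaultdict

def decade_abuse_rate(records):
    """Aggregate per-line abuse labels into per-decade rates.

    records: iterable of (year, is_abusive) pairs, one per dialogue line.
    Returns {decade: fraction of lines flagged abusive}.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for year, is_abusive in records:
        decade = (year // 10) * 10  # e.g. 1957 -> 1950
        totals[decade] += 1
        flagged[decade] += int(is_abusive)
    return {d: flagged[d] / totals[d] for d in sorted(totals)}

# Toy labels standing in for classifier output on subtitle lines.
sample = [(1955, False), (1958, True), (2010, True), (2014, True), (2016, False)]
rates = decade_abuse_rate(sample)
```

The same aggregation, keyed additionally by genre or by Oscar/blockbuster cohort, would reproduce the comparisons the abstract reports.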
How an Iowa School District Used ChatGPT to Ban Books
For bookworms, reading a headline like "School District Uses ChatGPT to Help Remove Library Books" can be blood-boiling. As Vulture put it earlier this week, it creates the sense that the artificial intelligence tool is once again "[taking] out its No. 1 enemy: original work." Using ChatGPT's guidance, the Mason City Community School District removed 19 titles--including Margaret Atwood's The Handmaid's Tale and Toni Morrison's Beloved--from its library shelves. But there is another truth: Educators who must comply with vague laws about "age-appropriate" books with "descriptions or visual depictions of a sex act" have only so many options. Signed into law by Governor Kim Reynolds in May, Iowa's SF 496 is one of those "parental rights" bills that have become popular with Republican lawmakers of late and that seek to limit discussion of sexuality and gender identity in schools.
SNaC: Coherence Error Detection for Narrative Summarization
Goyal, Tanya; Li, Junyi Jessy; Durrett, Greg
Progress in summarizing long texts is inhibited by the lack of appropriate evaluation frameworks. When a long summary must be produced to appropriately cover the facets of that text, that summary needs to present a coherent narrative to be understandable by a reader, but current automatic and human evaluation methods fail to identify gaps in coherence. In this work, we introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries. We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries. Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators. Furthermore, we show that the collected annotations allow us to train a strong classifier for automatically localizing coherence errors in generated summaries as well as benchmarking past work in coherence modeling. Finally, our SNaC framework can support future work in long document summarization and coherence evaluation, including improved summarization modeling and post-hoc summary correction.
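SNaC's annotations are span-level: errors are marked over character ranges of a generated summary and must be attributed to sentences for localization. A minimal sketch of that mapping, assuming sentences are joined by single spaces; the span format and the error-type label are illustrative, not SNaC's actual taxonomy or data schema.

```python
def locate_coherence_errors(sentences, error_spans):
    """Attach character-level error annotations to overlapping sentences.

    sentences: the summary split into sentence strings (the full summary
    is their concatenation, joined by single spaces).
    error_spans: (start, end, error_type) triples over the full summary.
    Returns one list of error types per sentence.
    """
    labels = []
    offset = 0
    for sent in sentences:
        start, end = offset, offset + len(sent)
        labels.append([etype for (s, e, etype) in error_spans
                       if s < end and e > start])  # half-open overlap test
        offset = end + 1  # skip the joining space
    return labels

summary = ["He met her at the party.", "They fled the city together."]
# Hypothetical annotation: an ambiguous reference starting at "They".
spans = [(25, 29, "referential_ambiguity")]
labels = locate_coherence_errors(summary, spans)
```

Sentence-level labels produced this way are the natural supervision target for the error-localization classifier the abstract mentions.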
DeepPainter: Painter Classification Using Deep Convolutional Autoencoders
David, Eli; Netanyahu, Nathan S.
In this paper, we describe the problem of painter classification and propose a novel approach based on deep convolutional autoencoder neural networks. While previous approaches relied on image processing and manual feature extraction from paintings, our approach operates at the raw pixel level, without any preprocessing or manual feature extraction. We first train a deep convolutional autoencoder on a dataset of paintings, and subsequently use it to initialize a supervised convolutional neural network for the classification phase. The proposed approach substantially outperforms previous methods, improving the state-of-the-art for the 3-painter classification problem from 90.44% to 96.52% accuracy, i.e., a 63% reduction in error rate.
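The two-phase scheme in the abstract (unsupervised autoencoder pretraining, then reusing the trained encoder to initialize a supervised classifier) can be sketched in PyTorch. This is not the authors' implementation; the layer sizes, input resolution, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Phase 1: learn to reconstruct paintings from raw pixels."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

class PainterClassifier(nn.Module):
    """Phase 2: reuse the pretrained encoder, add a classification head."""
    def __init__(self, encoder, n_painters=3):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_painters))

    def forward(self, x):
        return self.head(self.encoder(x))

ae = ConvAutoencoder()
# Phase 1 (omitted here): minimize MSE between ae(x) and x over paintings.
# Phase 2: the classifier starts from the reconstruction-trained weights.
clf = PainterClassifier(ae.encoder, n_painters=3)
logits = clf(torch.randn(2, 3, 64, 64))  # batch of 2 RGB 64x64 crops
```

The encoder weights carry over by reference, so whatever features the reconstruction objective learned become the classifier's initialization, which the supervised phase then fine-tunes.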